Computational aspects of mining maximal frequent patterns

Author

  • Guizhen Yang

Abstract

In this paper we study the complexity-theoretic aspects of mining maximal frequent patterns, from the perspective of counting the number of all distinct solutions. We present the first formal proof that the problem of counting the number of maximal frequent itemsets in a database of transactions, given an arbitrary support threshold, is #P-complete, thereby providing theoretical evidence that the problem of mining maximal frequent itemsets is NP-hard. We also extend our complexity analysis to other similar data mining problems that deal with complex data structures, such as sequences, trees, and graphs. We investigate several variants of these mining problems in which the patterns of interest are subsequences, subtrees, or subgraphs, and show that the associated problems of counting the number of maximal frequent patterns are all either #P-complete or #P-hard.
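
To make the counting problem concrete, here is a minimal brute-force sketch in Python. It is only an illustration of the definitions, not the paper's construction or proof technique. The function names are hypothetical, the support threshold is assumed to be an absolute transaction count, and the empty itemset is deliberately excluded. The search is exponential in the number of items, which is in keeping with the hardness result stated above.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent_itemsets(transactions, min_support):
    """Enumerate maximal frequent itemsets by exhaustive search.

    An itemset is frequent if its support is at least `min_support`
    (assumed here to be an absolute count), and maximal if no proper
    superset of it is also frequent.
    """
    items = sorted(set().union(*transactions))
    # Collect every non-empty frequent itemset (exponentially many candidates).
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(frozenset(c), transactions) >= min_support
    ]
    # Keep only those with no frequent proper superset.
    return [s for s in frequent if not any(s < t for t in frequent)]


# Toy database of four transactions over the items a, b, c, d.
transactions = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("acd"),
    frozenset("bc"),
]
maximal = maximal_frequent_itemsets(transactions, min_support=2)
print(len(maximal))  # the quantity the counting problem asks for
```

With a threshold of 2, the maximal frequent itemsets of this toy database are {a, b}, {a, c}, {a, d}, and {b, c}: every three-item set occurs in at most one transaction, so none of them is frequent.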

Similar resources

An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs t...

MaRFI: Maximal Regular Frequent Itemset Mining using a pair of Transaction-ids

Frequent pattern mining is a fundamental and dominant research area in data mining. Maximal frequent patterns are one of the compact representations of frequent itemsets. There are a number of algorithms for finding maximal frequent patterns that are suitable for mining transactional databases. Users are not only interested in occurrence frequency but may also be interested in frequent patterns tha...

Max-FTP: Mining Maximal Fault-Tolerant Frequent Patterns from Databases

Mining Fault-Tolerant (FT) Frequent Patterns in real-world (dirty) databases is considered to be a fruitful direction for future data mining research. In the last couple of years, a number of different algorithms have been proposed on the basis of the Apriori-FT frequent pattern mining concept. The main limitation of these existing FT frequent pattern mining algorithms is that they try to find all FT f...

MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance

Consecutive pattern mining, which aims at finding sequential patterns that are substrings, is a special case of frequent pattern mining and has played a crucial role in many real-world applications, especially in biological sequence analysis, time series analysis, and network log mining. Approximations, including insertions, deletions, and substitutions, between strings are widely used in biological seq...

Efficient Mining of Length-Maximal Flock Patterns from Large Trajectory Data

In this paper, we study the problem of mining a class of spatio-temporal patterns, called flock patterns, which represent groups of moving objects close to each other in a given time segment (Gudmundsson and van Kreveld, Proc. ACM GIS’06; Benkert, Gudmundsson, Hubner, Wolle, Computational Geometry, 41:11, 2008). Based on a frequent-pattern mining approach, such as Apriori, Eclat, or LCM, we presen...

Journal title:
  • Theor. Comput. Sci.

Volume 362

Publication date 2006